Image retrieval from scientific publications: Text and image content processing to separate multipanel figures

Authors

  • Emilia Apostolova
  • Daekeun You
  • Zhiyun Xue
  • Sameer K. Antani
  • Dina Demner-Fushman
  • George R. Thoma
Abstract

Images contained in scientific publications are widely considered useful for educational and research purposes, and their accurate indexing is critical for efficient and effective retrieval. Such image retrieval is complicated by the fact that figures in the scientific literature often combine multiple individual subfigures (panels). Multipanel figures are in fact the predominant pattern in certain types of scientific publications. The goal of this work is to automatically segment multipanel figures, a necessary step for automatic semantic indexing and in the development of image retrieval systems targeting the scientific literature. We have developed a method that uses the image content as well as the associated figure caption to: (1) automatically detect panel boundaries; (2) detect panel labels in the images and convert them to text; and (3) detect the labels and textual descriptions of each panel within the captions. Our approach combines the output of image-content and text-based processing steps to split the multipanel figures into individual subfigures and assign to each subfigure its corresponding section of the caption. The developed system achieved precision of 81% and recall of 73% on the task of automatic segmentation of multipanel figures.
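As a rough illustration of step (3), the sketch below splits a caption into per-panel descriptions. It is not the authors' method: the label pattern (a single letter in parentheses), the function name `split_caption`, and the sample caption are all assumptions made for this example, and real captions use many other label styles.

```python
import re

# Assumed label style: one letter in parentheses, e.g. "(A)" or "(b)".
# This pattern is hypothetical and chosen only for illustration.
PANEL_LABEL = re.compile(r"\(([A-Za-z])\)")


def split_caption(caption):
    """Map each panel label found in the caption to the text that follows it."""
    matches = list(PANEL_LABEL.finditer(caption))
    panels = {}
    for i, match in enumerate(matches):
        # A panel's description runs until the next label or the caption's end.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        text = caption[match.end():end].strip(" .;,")
        panels[match.group(1).upper()] = text
    return panels


if __name__ == "__main__":
    sample = "Figure 2. (A) Chest X-ray. (B) CT scan of the same patient."
    for label, text in split_caption(sample).items():
        print(label, "->", text)
```

Labels recovered from the image itself (step 2) could then be matched against the keys of this mapping to attach each caption segment to its subfigure.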

Similar resources

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field that has received great attention in recent decades. In this paper, we present an approach to image retrieval based on the combination of text-based and content-based features. Keywords serve as the text-based features, and color and texture serve as the content-based features. A query in this system contains some keywords and an input...

A Modified Grasshopper Optimization Algorithm Combined with CNN for Content Based Image Retrieval

Nowadays, with huge progress in digital imaging, new image processing methods are needed to manage digital images stored on disks. Image retrieval has been one of the most challenging fields in digital image processing; it involves searching a large database to return images similar to the query image. Although much effective research has been carried out on this topic so far, t...

Semiautomatic Image Retrieval Using the High Level Semantic Labels

Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges associated with each of these approaches have led researchers to combine them and to use semi-automatic retrieval with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...

Using Text Surrounding Method to Enhance Retrieval of Online Images by Google Search Engine

Purpose: The current research aimed to compare the effectiveness of various tags and codes for retrieving images from Google. Design/methodology: Selected images with different characteristics in a registered domain were carefully studied. The exception was that special conceptual features were apportioned to each group of images separately. In this regard, each group image surr...

Directional Stroke Width Transform to Separate Text and Graphics in City Maps

City maps are among the complex documents found in the real world. In these maps, text labels overlap with graphics and appear in a variety of fonts and styles in different orientations. Usually, text and graphic colours are not predefined because map publishers vary. In most city maps, text and graphic lines form a single connected component. Moreover, the common regions of text and graphic lin...

Journal:
  • JASIST

Volume 64  Issue 

Pages  -

Publication date: 2013